Mapping Streaming Applications to OpenCL
Abstract
Graphics processing units (GPUs) have been gaining popularity in general-purpose and high-performance computing. A GPU is made up of a number of streaming multiprocessors (SMs), each of which consists of many processing cores. A large number of general-purpose applications have been mapped onto GPUs efficiently. Stream processing applications, however, exhibit properties such as unfavorable data movement patterns and a low computation-to-communication ratio that might lead to poor performance on a GPU. We describe a previously developed automated mapping framework that maps most stream processing applications onto NVIDIA GPUs efficiently by taking their architectural characteristics into account. We then discuss the implementation details of porting the mapping framework to OpenCL running on AMD GPUs and evaluate its performance by running several benchmarks. The performance of the generated CUDA and OpenCL code is compared under different heuristics.
Similar resources
Workload distribution and balancing in FPGAs and CPUs with OpenCL and TBB
In this paper we evaluate the performance and energy effectiveness of FPGA and CPU devices for a class of parallel computing applications in which the workload can be distributed in a way that enables simultaneous computing in addition to simple offloading. The FPGA device is programmed via OpenCL using the recent availability of commercial tools and hardware while Threading Building Blocks (TB...
Methods for Optimizing OpenCL Applications on Heterogeneous Multicore Architectures
Heterogeneous multicore architectures with CPU and add-on GPUs or streaming processors are now widely used in computer systems. These GPUs provide substantially more computation capability and memory bandwidth compared to traditional multi-cores. Also, because they are highly programmable, they provide the computational performance needed for realistic graphics rendering. Applications with gene...
Operating System Support for Fine-grained Pipeline Parallelism on Heterogeneous Multicore Accelerators
On-chip special-purpose accelerators are a promising technique in the achievement of high-performance and energy-efficient computing. In particular, fine-grained pipelined execution with multicore accelerators is suitable for streaming applications such as JPEG decoders, which consist of a series of different tasks and process streaming data. CPUs that assign each task to appropriate accelerato...
Comparison of OpenMP & OpenCL Parallel Processing Technologies
This paper presents a comparison of OpenMP and OpenCL based on the parallel implementation of algorithms from various fields of computer applications. The focus of our study is on benchmark performance, comparing OpenMP and OpenCL. We observed that the OpenCL programming model is a good option for mapping threads on different processing cores. Balancing all available cores and allocating suff...
ADAS on COTS with OpenCL: A Case Study with Lane Detection
The concept of autonomous cars is driving a boost for car electronics, and the size of the automotive semiconductor market is foreseen to double by 2025. How to benefit from this boost is an interesting question. This article presents a case study to test the feasibility of using OpenCL as the programming language and COTS components as the underlying platforms for ADAS development. For representati...
Publication date: 2012